Using Synonym Relations in Chinese Collocation Extraction
Abstract
A challenging task in Chinese collocation extraction is to improve both precision and recall. Most lexical statistical methods, including Xtract, cannot extract collocations whose frequency falls below a given threshold. This paper presents a method in which HowNet is used to find synonyms by means of a similarity function. Using this synonym information, we successfully extract synonymous collocations that normally cannot be extracted with the lexical statistical approach. We apply synonym mapping to each headword to extract more synonymous word bi-grams. Our evaluation on a 60 MB tagged corpus shows that we can extract synonymous collocations that occur with very low frequency, sometimes even collocations that occur only once in the training set. Compared with a collocation extraction system based on Xtract, we reach a precision of 43% on word bi-grams for a set of 9 headwords, almost a 50% improvement over the 30% precision of the Xtract system. Furthermore, the method improves the recall of word bi-gram collocation extraction by 30%.
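The synonym-mapping idea in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the similarity table, the 0.85 threshold, the English toy words, and the names `similarity`, `synonyms`, and `expand_collocations` are all assumptions made for illustration, and the hard-coded scores merely stand in for a HowNet-based similarity function.

```python
from collections import Counter

# Hypothetical similarity scores; a real system would derive these from HowNet.
TOY_SIMILARITY = {
    frozenset(("buy", "purchase")): 0.95,
    frozenset(("buy", "sell")): 0.40,
}


def similarity(a, b):
    """Return a similarity score in [0, 1] for two words (toy lookup)."""
    if a == b:
        return 1.0
    return TOY_SIMILARITY.get(frozenset((a, b)), 0.0)


def synonyms(word, vocabulary, threshold=0.85):
    """Words in the vocabulary whose similarity to `word` meets the threshold."""
    return {w for w in vocabulary if w != word and similarity(word, w) >= threshold}


def expand_collocations(seeds, bigram_counts, threshold=0.85):
    """Map each seed headword to its synonyms and keep the resulting
    bi-grams that occur at least once in the corpus counts."""
    vocabulary = {head for head, _ in bigram_counts}
    found = set()
    for head, collocate in seeds:
        for syn in synonyms(head, vocabulary, threshold):
            candidate = (syn, collocate)
            if bigram_counts[candidate] >= 1 and candidate not in seeds:
                found.add(candidate)
    return found


# Toy corpus counts: ("purchase", "ticket") occurs only once, so a pure
# frequency threshold would miss it.
bigram_counts = Counter({
    ("buy", "ticket"): 25,
    ("purchase", "ticket"): 1,
    ("sell", "ticket"): 3,
})
seeds = {("buy", "ticket")}  # assumed output of a statistical extractor
result = expand_collocations(seeds, bigram_counts)
```

In this toy setting the seed collocation found by frequency is used to recover a synonymous bi-gram that appears only once, which is the low-frequency case the abstract highlights.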
Similar resources
Similarity Based Chinese Synonym Collocation Extraction
Collocation extraction systems based on pure statistical methods suffer from two major problems. The first problem is their relatively low precision and recall rates. The second problem is their difficulty in dealing with sparse collocations. In order to improve performance, both statistical and lexicographic approaches should be considered. This paper presents a new method to extract synonymou...
Arabic Collocation Extraction Based on Hybrid Methods
Collocation extraction plays an important role in machine translation, information retrieval, second language learning, etc., and has achieved significant results in other languages, e.g. English and Chinese. There are some studies of Arabic collocation extraction that use POS annotation to extract Arabic collocations. We used a hybrid method that included POS patterns and syntactic depend...
A Hybrid Extraction Model for Chinese Noun/Verb Synonym bi-gram Collocations
Statistical-based collocation extraction approaches suffer from (1) low precision rate because high co-occurrence bi-grams may be syntactically unrelated and are thus not true collocations; (2) low recall rate because some true collocations with low occurrences cannot be identified successfully by statistical-based models. To integrate both syntactic rules as well as semantic knowledge into a s...
Construction of Semantic Collocation Bank Based on Semantic Dependency Parsing
Collocation has always been an important issue in language research, especially in Chinese language research. Chinese is an isolating language, which lacks morphological changes. Establishing a relatively complete dictionary of Chinese collocations will be a great contribution to Chinese study and research. Collocation plays a significant supporting role in many fields of NLP, such as informatio...
Using Collocation Statistics in Information Extraction
Our main objective in participating in MUC-7 is to investigate and experiment with the use of collocation statistics in information extraction. A collocation is a habitual word combination, such as "weather a storm", "file a lawsuit", and "the falling yen". Collocation statistics refers to the frequency counts of the collocational relations extracted from a parsed corpus. For example, out of 6577 i...